

Search for: All records where Creators/Authors contains: "Xu, Pengfei"


  1. Abstract

    Catalyst impregnation is the first and one of the most crucial steps in preparing industrial catalysts. The process is typically performed in rotating vessels with a spray nozzle that distributes liquid onto porous catalyst supports until the pore volume is reached. The inter-particle variability of the impregnated liquid significantly affects the activity and selectivity of the resulting catalyst. Current scale-up practices lead to poor fluid distribution and inhomogeneity in the liquid content. The aim of this work is to understand the dynamic behavior of the particles under the spray nozzle, which is essential for achieving the desired content uniformity, and to develop a scale-up model for the dry impregnation process. We considered four dimensionless numbers in the scaling analysis; the scale-up rules require that these dimensionless numbers be kept constant across scales. Both DEM simulations and matching experiments of dry impregnation inside the porous particles were performed for different vessel sizes. The water content of the particles was compared across times and locations, and the relative standard deviation was calculated from the axial water content. Simulation and experimental results show that particles achieve similar content uniformity at the end of impregnation, confirming that the scale-up rules are applicable to all vessel sizes. The dimensionless numbers give very good scale-up performance: the curves collapse, indicating similarity across the processes. In addition, the scale-up method is validated for different particle sizes in simulations.
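    The uniformity metric described above (relative standard deviation of the axial water content) can be computed directly from per-particle data. The following is a minimal Python sketch assuming per-particle axial positions and water contents, e.g. as exported from a DEM simulation; the binning scheme and function names are illustrative, not taken from the paper.

    ```python
    import numpy as np

    def axial_rsd(z, water, n_bins=10):
        """Relative standard deviation (RSD) of water content along the vessel axis.

        z     : axial coordinate of each particle
        water : impregnated liquid content of each particle
        A lower RSD indicates better inter-particle content uniformity.
        """
        edges = np.linspace(z.min(), z.max(), n_bins + 1)
        idx = np.digitize(z, edges[1:-1])  # bin index (0 .. n_bins-1) per particle
        means = np.array([water[idx == i].mean()
                          for i in range(n_bins) if np.any(idx == i)])
        return means.std(ddof=1) / means.mean()

    # Matched dimensionless numbers should yield similar RSD curves across scales:
    # rsd_small = axial_rsd(z_small, w_small)
    # rsd_large = axial_rsd(z_large, w_large)
    ```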

The record-breaking performance of deep neural networks (DNNs) comes with heavy parameter budgets, which often necessitates external dynamic random access memory (DRAM) for storage. The prohibitive energy cost of DRAM accesses makes DNN deployment nontrivial on resource-constrained devices, calling for minimizing the movement of weights and data in order to improve energy efficiency. Driven by this critical bottleneck, we present SmartDeal, a hardware-friendly algorithm framework that trades higher-cost memory storage/access for lower-cost computation, in order to aggressively boost storage and energy efficiency for both DNN inference and training. The core technique of SmartDeal is a novel DNN weight matrix decomposition framework with structural constraints on each matrix factor, carefully crafted to unleash the hardware-aware efficiency potential. Specifically, we decompose each weight tensor as the product of a small basis matrix and a large, structurally sparse coefficient matrix whose nonzero elements are quantized to powers of 2. The resulting sparse and readily quantized DNNs enjoy greatly reduced energy consumption in data movement and weight storage, while incurring minimal overhead to recover the original weights thanks to sparse bit-operations and cost-favorable computations. Beyond inference, we take another leap to embrace energy-efficient training, introducing several customized techniques to address the unique roadblocks that arise in training while preserving the SmartDeal structures. We also design a dedicated hardware accelerator that fully exploits the new weight structure to improve real energy efficiency and latency. We conduct experiments on both vision and language tasks, with nine models, four datasets, and three settings (inference-only, adaptation, and fine-tuning). Our extensive results show that 1) applied to inference, SmartDeal achieves up to 2.44x improvement in energy efficiency as evaluated using real hardware implementations, and 2) applied to training, SmartDeal leads to 10.56x and 4.48x reductions in storage and training energy cost, respectively, with usually negligible accuracy loss, compared to state-of-the-art training baselines. Our source code is available at: https://github.com/VITA-Group/SmartDeal.
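    As an illustration of the decomposition idea, here is a toy Python sketch that factors a weight matrix into a small dense basis and a sparse, power-of-2-quantized coefficient matrix. It uses a truncated SVD plus magnitude thresholding as stand-ins; the paper's actual factorization imposes its own structural constraints and is solved differently (see the linked repository).

    ```python
    import numpy as np

    def power_of_two_quantize(x):
        """Round nonzero entries to the nearest signed power of 2 (in log2 space)."""
        sign = np.sign(x)
        mag = np.abs(x)
        exp = np.round(np.log2(np.where(mag > 0, mag, 1.0)))
        return np.where(mag > 0, sign * 2.0 ** exp, 0.0)

    def smartdeal_like_decompose(W, rank, sparsity=0.7):
        """Factor W (m x n) into a small basis B (m x rank) and a sparse,
        power-of-2 coefficient matrix C (rank x n). Illustrative only."""
        U, S, Vt = np.linalg.svd(W, full_matrices=False)
        B = U[:, :rank] * S[:rank]                   # small dense basis
        C = Vt[:rank]                                # coefficient matrix
        thresh = np.quantile(np.abs(C), sparsity)
        C = np.where(np.abs(C) >= thresh, C, 0.0)    # sparsify (simplified)
        C = power_of_two_quantize(C)                 # cheap to store and multiply
        return B, C

    W = np.random.randn(64, 128)
    B, C = smartdeal_like_decompose(W, rank=16)
    err = np.linalg.norm(W - B @ C) / np.linalg.norm(W)  # reconstruction error
    ```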
  3. (Frankle & Carbin, 2019) shows that there exist winning tickets (small but critical subnetworks) for dense, randomly initialized networks, that can be trained alone to achieve comparable accuracies to the latter in a similar number of iterations. However, the identification of these winning tickets still requires the costly train-prune-retrain process, limiting their practical benefits. In this paper, we discover for the first time that the winning tickets can be identified at the very early training stage, which we term as Early-Bird (EB) tickets, via low-cost training schemes (e.g., early stopping and low-precision training) at large learning rates. Our finding of EB tickets is consistent with recently reported observations that the key connectivity patterns of neural networks emerge early. Furthermore, we propose a mask distance metric that can be used to identify EB tickets with low computational overhead, without needing to know the true winning tickets that emerge after the full training. Finally, we leverage the existence of EB tickets and the proposed mask distance to develop efficient training methods, which are achieved by first identifying EB tickets via low-cost schemes, and then continuing to train merely the EB tickets towards the target accuracy. Experiments based on various deep networks and datasets validate: 1) the existence of EB tickets and the effectiveness of mask distance in efficiently identifying them; and 2) that the proposed efficient training via EB tickets can achieve up to 5.8x ~ 10.7x energy savings while maintaining comparable or even better accuracy as compared to the most competitive state-of-the-art training methods, demonstrating a promising and easily adopted method for tackling cost-prohibitive deep network training. 
Convolutional neural networks (CNNs) are increasingly deployed to edge devices, and many efforts have been made towards efficient CNN inference on resource-constrained platforms. This paper explores an orthogonal direction: how to conduct more energy-efficient training of CNNs, so as to enable on-device training. We reduce the energy cost of training by dropping unnecessary computations at three complementary levels: stochastic mini-batch dropping at the data level, selective layer update at the model level, and sign prediction for low-cost, low-precision back-propagation at the algorithm level. Extensive simulations and ablation studies, with real energy measurements from an FPGA board, confirm the superiority of the proposed strategies and demonstrate remarkable energy savings for training. For example, when training ResNet-74 on CIFAR-10, we achieve aggressive energy savings of >90% and >60% while incurring top-1 accuracy losses of only about 2% and 1.2%, respectively. When training ResNet-110 on CIFAR-100, over 84% of the training energy is saved without degrading inference accuracy.
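    Of the three levels, stochastic mini-batch dropping is the simplest to illustrate: each mini-batch is skipped i.i.d. with some probability, so the expected per-epoch computation shrinks proportionally. A minimal, framework-agnostic Python sketch follows; `train_step` and the drop probability are assumptions for illustration, not the paper's exact settings.

    ```python
    import random

    def train_epoch_with_smd(batches, train_step, p_drop=0.5):
        """Stochastic mini-batch dropping: skip each mini-batch i.i.d. with
        probability p_drop, saving a proportional share of the per-epoch
        forward/backward computation (and hence training energy)."""
        for batch in batches:
            if random.random() < p_drop:
                continue            # drop this batch: no forward/backward pass
            train_step(batch)       # caller-supplied forward/backward/update
    ```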
  5. Whereas the gill chambers of jawless vertebrates open directly into the environment, jawed vertebrates evolved skeletal appendages that drive oxygenated water unidirectionally over the gills. A major anatomical difference between the two jawed vertebrate lineages is the presence of a single large gill cover in bony fishes versus separate covers for each gill chamber in cartilaginous fishes. Here, we find that these divergent patterns correlate with the pharyngeal arch expression of Pou3f3 orthologs. We identify a deeply conserved Pou3f3 arch enhancer present in humans through sharks but undetectable in jawless fish. Minor differences between the bony and cartilaginous fish enhancers account for their restricted versus pan-arch expression patterns. In zebrafish, mutation of Pou3f3 or the conserved enhancer disrupts gill cover formation, whereas ectopic pan-arch Pou3f3b expression generates ectopic skeletal elements resembling the multimeric covers of cartilaginous fishes. Emergence of this Pou3f3 arch enhancer >430 Mya and subsequent modifications may thus have contributed to the acquisition and diversification of gill covers and respiratory strategies during gnathostome evolution.

     